Within the scaling laws paradigm, which underpins the training of large neural networks like ChatGPT and Llama, we consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon, a critical performance degradation due to synthetic data in the training corpus. Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse: larger and larger training sets do not enhance performance. We further investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse. In a simplified regime where neural networks are approximated via random projections of tunable size, we show both theoretically and empirically that larger models can amplify model collapse. Interestingly, our theory also indicates that, beyond the interpolation threshold (which can be extremely high for very large datasets), larger models may mitigate the collapse, although they do not entirely prevent it. Our theoretical findings are empirically verified through experiments on language models and feed-forward neural networks for images.
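To make the setting concrete, the following is a minimal sketch in this spirit, not the authors' code: a linear teacher generates "real" labels, a perturbed teacher stands in for a synthetic-data generator, and ridge regression on random-projection (random-feature) inputs is fit on a mixture of the two while test error is measured on the real distribution. All names, dimensions, noise levels, and the 0.5 perturbation are illustrative assumptions.

    # Minimal sketch (assumed setup, not the paper's code): ridge regression on
    # random-projection features, trained on a mix of real and synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    d, k, lam = 50, 200, 1e-3                      # input dim, projection width, ridge penalty (illustrative)
    w_true = rng.normal(size=d)                    # real data-generating weights
    w_synth = w_true + 0.5 * rng.normal(size=d)    # perturbed weights stand in for a synthetic-data generator

    def make_data(n, w):
        X = rng.normal(size=(n, d))
        y = X @ w + 0.1 * rng.normal(size=n)
        return X, y

    S = rng.normal(size=(d, k)) / np.sqrt(d)       # fixed random projection of tunable width k

    def test_error(n_train, synth_frac):
        n_s = int(synth_frac * n_train)
        Xr, yr = make_data(n_train - n_s, w_true)  # real portion of the corpus
        Xs, ys = make_data(n_s, w_synth)           # synthetic portion of the corpus
        X = np.vstack([Xr, Xs])
        y = np.concatenate([yr, ys])
        Phi = np.maximum(X @ S, 0)                 # random-feature (ReLU) map
        w_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(k), Phi.T @ y)
        Xt, yt = make_data(2000, w_true)           # evaluate on the real distribution
        return np.mean((np.maximum(Xt @ S, 0) @ w_hat - yt) ** 2)

    for n in [200, 1000, 5000, 20000]:
        print(n, test_error(n, 0.0), test_error(n, 0.01))

Comparing the clean run (synth_frac=0.0) with the 1% contaminated run across growing n is one simple way to probe whether additional data keeps improving performance or plateaus, which is the question the abstract raises.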
In human networks, nodes belonging to a marginalized group often have a disproportionate rate of unknown or missing features. This, in conjunction with graph structure and known feature biases, can cause graph feature imputation algorithms to predict values for unknown features that make the marginalized group's feature values more distinct from the dominant group's feature values than they are in reality. We call this distinction the discrimination risk. We prove that a higher discrimination risk can amplify the unfairness of a machine learning model applied to the imputed data. We then formalize a general graph feature imputation framework called mean aggregation imputation and theoretically and empirically characterize graphs in which applying this framework can yield feature values with a high discrimination risk. We propose a simple algorithm to ensure mean aggregation-imputed features provably have a low discrimination risk, while minimally sacrificing reconstruction error (with respect to the imputation objective). We evaluate the fairness and accuracy of our solution on synthetic and real-world credit networks.
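As a rough illustration of the mean aggregation imputation idea described above, here is a minimal sketch under my own assumptions, not the paper's fairness-constrained algorithm: each node's unknown feature is filled in with the mean of its observed neighbors' values, falling back to the global mean when no neighbor is observed. The toy adjacency matrix, feature vector, and missingness pattern are made up for the example.

    # Minimal sketch of neighborhood-mean feature imputation on a small graph
    # (toy data; not the paper's algorithm or its low-discrimination-risk variant).
    import numpy as np

    # Adjacency matrix of a small undirected 5-node graph (illustrative).
    A = np.array([
        [0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0],
        [1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 1, 1, 0],
    ], dtype=float)

    x = np.array([0.9, np.nan, 0.4, np.nan, 0.2])   # nan marks unknown features
    known = ~np.isnan(x)

    x_imp = x.copy()
    for i in np.where(~known)[0]:
        nbrs = np.where(A[i] > 0)[0]
        nbrs_known = nbrs[known[nbrs]]              # aggregate only over observed neighbors
        if len(nbrs_known) > 0:
            x_imp[i] = x[nbrs_known].mean()
        else:
            x_imp[i] = x[known].mean()              # fall back to the global mean

    print(x_imp)

Because imputed values inherit whatever biases are present in the observed neighborhood features, a group whose neighbors' values are systematically skewed can be pushed further from the dominant group after imputation, which is the discrimination risk the abstract formalizes.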